Chroma-key for efficient and low complexity shape representation of coded arbitrary video objects

ABSTRACT

A technique for implicitly encoding shape information by using a chroma-key color. A bounding box is created enclosing the video object. The bounding box is extended to the next integer multiple of the macroblock size and divided into a plurality of macroblocks. For each boundary macroblock, each pixel outside the object is replaced with the chroma-key color to implicitly encode shape information. Pixel data for boundary macroblocks and macroblocks inside the object are DCT transformed, scaled and motion compensated. A finer quantizer (smaller quantizer) is used for boundary macroblocks to improve image quality. A first_shape_code can be used to identify each macroblock as either 1) inside the object; 2) outside the object; or 3) on the object boundary. To improve data compression and achieve low complexity shape extraction with DCT and motion compensation, a first_shape_code is sent for all macroblocks, and only macroblocks that are inside the object or on the object boundary are coded. The decoding system decodes the first_shape_code and, if necessary, the DCT and motion compensation information. The motion compensated luminance and chrominance pixel values of a reconstructed object at the decoding system are compared to the chroma-key color and thresholds to reconstruct the shape of the object and to output texture information of the video object.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority from U.S. Provisional Application Ser. No. 60/052971, filed on Jul. 18, 1997. This application is a continuation-in-part of co-pending application Ser. No. 08/801,716, filed on Feb. 14, 1997, entitled “Method and Apparatus for Coding Segmented Regions Which May Be Transparent In Video Sequences For Content-Based Scalability,” incorporated by reference herein.

BACKGROUND OF THE INVENTION

[0002] The present invention relates to the field of digital video coding technology and, more particularly, to a method and apparatus for providing an improved chroma-key shape representation of video objects of arbitrary shape.

[0003] A variety of protocols for communication, storage and retrieval of video images are known. Invariably, the protocols are developed with a particular emphasis on reducing signal bandwidth. With a reduction of signal bandwidth, storage devices are able to store more images and communications systems can send more images at a given communication rate. Reduction in signal bandwidth increases the overall capacity of the system using the signal.

[0004] However, bandwidth reduction may be associated with particular disadvantages. For instance, certain known coding systems are lossy because they introduce errors which may affect the perceptual quality of the decoded image. Others may achieve significant bandwidth reduction for certain types of images but may not achieve any bandwidth reduction for others. Accordingly, the selection of coding schemes must be carefully considered.

[0005] The Moving Picture Experts Group (MPEG) has successfully introduced two standards for coding of audiovisual information, known by the acronyms MPEG-1 and MPEG-2. MPEG is currently working on a new standard, known as MPEG-4. MPEG-4 video aims at providing standardized core technologies allowing efficient storage, transmission and manipulation of video data in multimedia environments. A detailed proposal for MPEG-4 is set forth in MPEG-4 Video Verification Model (VM) 5.0, hereby incorporated by reference.

[0006] MPEG-4 considers a scene to be a composition of video objects. In most applications, each video object represents a semantically meaningful object. Each uncompressed video object is represented as a set of Y, U, and V components (luminance and chrominance values) plus information about its shape, stored frame after frame in predefined temporal intervals. Each video object is separately coded and transmitted with other objects. As described in MPEG-4, a video object plane (VOP) is an occurrence of a video object at a given time. For a video object, two different VOPs represent snapshots of the same video object at two different times. For simplicity, we have often used the term video object to refer to its VOP at a specific instant in time.

[0007] As an example, FIG. 1(A) illustrates a frame for coding that includes the head and shoulders of a narrator, a logo suspended within the frame and a background. FIGS. 1(B)-1(D) illustrate the frame of FIG. 1(A) broken into three VOPs. By convention, the background generally is assigned VOP0. The narrator and logo may be assigned VOP1 and VOP2, respectively. Within each VOP, all image data is coded and decoded identically.

[0008] The VOP encoder for MPEG-4 separately codes shape information and texture (luminance and chrominance) information for the video object. The shape information is encoded as an alpha map that indicates whether or not each pixel is part of the video object. The texture information is coded as luminance and chrominance values. Thus, the VOP encoder for MPEG-4 employs explicit shape coding because the shape information is coded separately from the texture information (luminance and chrominance values for each pixel). While an explicit shape coding technique can provide excellent results at high bit rates, explicit shape coding requires additional bandwidth for carrying shape information separate from texture information. Moreover, results are unimpressive for explicit shape coding at low coding bit rates because significant bandwidth is occupied by explicit shape information, resulting in low quality texture reconstruction for the object.

[0009] As an alternative to explicitly coding shape information, implicit shape coding techniques have been proposed in which shape information is not explicitly coded. Rather, in implicit shape coding, the shape of each object can be ascertained based on the texture information. Implicit shape coding techniques provide a simpler design (less complex than explicit techniques) and reasonable performance, particularly at lower bit rates. Implicit shape coding reduces signal bandwidth because shape information is not explicitly transmitted. As a result, implicit shape coding can be particularly important for low bit rate applications, such as mobile and other wireless applications.

[0010] However, implicit shape coding generally does not perform as well as explicit shape coding, particularly for more demanding scenes. For example, objects often contain color bleeding artifacts on object edges when using implicit shape coding. Also, it can be difficult to obtain lossless shapes using implicit techniques because shape coding quality is determined by texture coding quality and is not provided explicitly. Therefore, a need exists for an improved implicit shape coding technique.

SUMMARY OF THE INVENTION

[0011] The system of the present invention can include an encoding system and a decoding system that overcome the disadvantages and drawbacks of prior systems.

[0012] An encoding system uses chroma-key shape coding to implicitly encode shape information with texture information. The encoding system includes a bounding box generator and color replacer, a DCT encoder, a quantizer, a motion estimator/compensator and a variable length coder. A video object to be encoded is enclosed by a bounding box, and only macroblocks in the bounding box are processed to improve data compression. Each macroblock inside the bounding box is identified as either 1) outside the object; 2) inside the object; or 3) on the object boundary. Macroblocks outside the object are not coded, to further improve data compression. For boundary macroblocks, pixels located outside the object (background pixels) are replaced with a chroma-key color K to implicitly encode the shape of the object. The luminance and chrominance values for macroblocks inside the object and on the object boundary are coded, including transforming the luminance and chrominance values to obtain DCT coefficients and quantizing (scaling) the DCT coefficients. Motion compensation can also be performed on some macroblocks to generate motion vectors. In addition, to improve image quality, boundary macroblocks can be quantized at a finer level than other macroblocks in the bounding box. A bitstream is output from the encoding system. The bitstream can include the encoded macroblock pixel data, a code identifying the position (e.g., inside, outside or on the boundary) of each macroblock, the chroma-key value and thresholds, motion vectors and one or more quantizers. Where a finer quantization is applied to boundary macroblocks, the bitstream also includes a code indicating the exact quantizer used for boundary macroblocks and a code indicating the number of quantization levels for macroblocks inside the object.

[0013] A decoding system includes a variable length decoder, an inverse quantizer, a motion compensator, an inverse DCT coder, and a color extractor and shape mask detector. A bitstream is received and decoded by the decoding system to obtain both texture information (e.g., luminance and chrominance data) and shape information for a video object. The shape information is implicitly encoded. DCT coefficients for each macroblock are inverse quantized (rescaled) based on the codes (quantizers) identifying the specified quantizer or the specified number of quantization levels for each, and the decoded motion vectors are used for motion compensation. The reconstructed video object is obtained by passing only the pixel values for the object (e.g., by rejecting pixel values within a predetermined range of the chroma-key). The shape of the video object is obtained by generating a binary map or shape mask (e.g., 1s or 0s) identifying each pixel as either inside the object or outside the object. A gray-scale map (shape mask) can be generated instead by using two thresholds to soften the object boundaries.

BRIEF DESCRIPTION OF THE DRAWINGS

[0014] FIG. 1(A) illustrates an example frame for coding.

[0015] FIGS. 1(B)-1(D) illustrate the frame of FIG. 1(A) broken into three Video Object Planes.

[0016] FIG. 2A is a block diagram illustrating an encoding system according to an embodiment of the present invention.

[0017] FIG. 2B is a block diagram of a decoding system according to an embodiment of the present invention.

[0018] FIG. 3 illustrates an example of a bounding box bounding a video object according to an embodiment of the present invention.

[0019] FIG. 4 illustrates an example of a video object according to an embodiment of the present invention.

[0020] FIG. 5 is a flow chart illustrating the operation of an encoding system according to an embodiment of the present invention.

[0021] FIG. 6 is a flow chart illustrating the operation of a decoding system according to an embodiment of the present invention.

DETAILED DESCRIPTION

[0022] Referring to the drawings in detail, wherein like numerals indicate like elements, FIG. 2A is a block diagram illustrating an encoding system according to an embodiment of the present invention. FIG. 2B is a block diagram of a decoding system according to an embodiment of the present invention.

[0023] Encoding system 202 uses chroma-key shape coding to implicitly encode shape information. According to the present invention, an encoding system 202 (FIG. 2A) receives a video picture or frame including a segmented video object as an input signal over line 204, representative of a VOP to be coded (e.g., it includes the object and some background). The input signal is sampled and organized into macroblocks, which are spatial areas of each frame. The encoding system 202 codes the macroblocks and outputs an encoded bitstream over a line 220 to a channel 219. The channel 219 may be a radio channel, a computer network or some storage medium such as a memory or a magnetic or optical disk. A decoding system 230 (FIG. 2B) receives the bitstream over line 228 from the channel 219 and reconstructs a video object therefrom for display.

[0024] Encoding system 202 includes a bounding box generator and color replacer 206 for generating a bounding box around the segmented video object and for replacing pixel values located outside the object boundary with a predetermined key color (or chroma-key color) K, according to an embodiment of the present invention. The chroma-key color and some threshold values are output on line 203 by bounding box generator and color replacer 206. According to an embodiment of the present invention, instead of enclosing each video object in a full size picture and processing all macroblocks in the received full size picture, the video object can advantageously be enclosed by a bounding box and only macroblocks in the bounding box are processed (e.g., pixel data is passed only for macroblocks inside the bounding box). According to an embodiment of the present invention, the position of the bounding box is chosen such that it contains a minimum number of 16 pixel×16 pixel macroblocks (while bounding the video object). As a result, processing time is reduced. By replacing background pixels with the chroma-key color, bounding box generator and color replacer 206 implicitly encodes information describing the shape of the video object in the texture (luminance and chrominance values) information for the object. According to an embodiment of the present invention, the bounding box generator and color replacer 206 outputs signals on line 201 including texture information (pixel values) for the object (for pixels inside the object boundary), and outputs the chroma-key pixel value for pixels outside the object boundary (because these pixels outside the object boundary were replaced with the chroma-key color).

[0025] The output of generator and color replacer 206 is coupled via line 201 to a macroblock formatter and mode decider 207. Macroblock formatter and mode decider 207 divides the video object into macroblocks (MBs) and determines whether each MB is inside the boundary (of the video object), outside the boundary, or on the boundary (e.g., having pixels inside and pixels outside the boundary of the object); this classification is known as the mode. The macroblock formatter and mode decider 207 then outputs a first_shape_code for each macroblock identifying the mode of each macroblock.

[0026] In addition, according to an embodiment of the present invention, macroblock formatter and mode decider 207 also operates like a filter because it outputs pixel data on line 208 (to be encoded) only for macroblocks that are either inside the boundary or on the boundary (pixel data are not output for macroblocks outside the boundary). The first_shape_code is generated for each macroblock and identifies those macroblocks for which no pixel data is transmitted. Thus, data compression and encoding speed are improved because pixel data for macroblocks outside the boundary will not be encoded and transmitted.

[0027] The pixel data on line 208 (including texture information or pixel values for the pixels inside the object boundary, and the replaced chroma-key values for those pixels outside the object boundary and inside the bounding box) is input to a difference circuit 215 and to a motion estimator/compensator 209. Motion estimator/compensator 209 generates a motion predicted signal that is output on line 225. Difference circuit 215 subtracts the motion predicted signal (on line 225) from the pixel data (on line 208) to output pixel difference values on line 213.

[0028] The pixel (image) difference values are input to a DCT encoder 210 via line 213. DCT encoder 210 performs a transformation of the image data, such as discrete cosine transform (“DCT”) coding or sub-band coding, from the pixel values (luminance and chrominance values) to DCT coefficients (frequency domain). A block of pixels is transformed to an equivalently sized block of DCT coefficients. DCT encoder 210 outputs DCT coefficients (corresponding to the pixel data) on line 212.
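The separable transform performed by DCT encoder 210 can be sketched in pure Python as below. This is a minimal, unoptimized illustration assuming an orthonormal 8×8 DCT-II; the names `dct_1d` and `dct_2d` are hypothetical, and a practical coder would use a fast (butterfly) implementation.

```python
import math

N = 8  # transform block size (8x8)


def dct_1d(vec):
    """Orthonormal 1-D DCT-II of a sequence of samples."""
    n = len(vec)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(vec[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform each row, then each column."""
    rows = [dct_1d(r) for r in block]
    height, width = len(rows), len(rows[0])
    # cols[j] is the vertical transform of column j of the row-transformed data
    cols = [dct_1d([rows[i][j] for i in range(height)]) for j in range(width)]
    # transpose back so that result[k][j] is vertical frequency k, horizontal j
    return [[cols[j][k] for j in range(width)] for k in range(height)]
```

For a flat block of value 100, all of the energy lands in the DC coefficient: `dct_2d([[100] * N for _ in range(N)])[0][0]` is 800 (up to floating-point rounding) and every other coefficient is essentially zero. Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared pixel values.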

[0029] A quantizer 214 is connected via line 212 to DCT encoder 210. Quantizer 214 scales or quantizes the DCT coefficients output on line 212 by dividing each coefficient by a predetermined quantizer. The quantizer is a constant or variable scalar value (Q_p). In general, the quantizer 214 reduces the bandwidth of the image signal by reducing the number of quantization levels available for encoding the signal. The quantization process is lossy: many small DCT coefficients input to the quantizer 214 are divided down and truncated to zero. The scaled signal (scaled or quantized DCT coefficients) is output from quantizer 214 via line 216.
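The divide-and-truncate rule of quantizer 214, together with the multiply rule used by the inverse quantizer, can be sketched as follows. This is a minimal illustration of the rule as stated in the text, assuming truncation toward zero; it is not necessarily the exact quantization formula of the MPEG-4 VM, and the function names are hypothetical.

```python
def quantize(coeffs, qp):
    """Divide each DCT coefficient by the quantizer Qp and truncate toward zero."""
    return [[int(c / qp) for c in row] for row in coeffs]


def dequantize(levels, qp):
    """Rescale each quantized level by multiplying it by Qp (decoder side)."""
    return [[lvl * qp for lvl in row] for row in levels]
```

With Q_p = 8, the coefficients 800, 7 and -13 become levels 100, 0 and -1; rescaling yields 800, 0 and -8. The small coefficient 7 is lost and -13 is only approximated, which is where the loss in the quantization process comes from.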

[0030] Usually, the same quantizer (a VOP_quant) is used to quantize DCT coefficients for all macroblocks of a VOP. However, according to an embodiment of the present invention, certain macroblocks (e.g., boundary macroblocks) can be quantized using a smaller quantizer to better define the boundary or edge of an object. A special quantizer, indicated by bound_quant, is used for boundary macroblocks. The boundary quantizer is specified by the bound_quant signal, which is output on line 217 from quantizer 214.

[0031] An inverse quantizer and inverse DCT encoder (e.g., a DCT decoder) receives the scaled DCT coefficients via line 216, inverse quantizes the DCT coefficients and then converts the DCT coefficients to pixel values to generate reconstructed pixel difference values, output on line 223.

[0032] An adder circuit 224 receives as inputs the pixel difference signal via line 223 and the motion predicted signal via line 225 (from motion estimator/compensator 209). Adder circuit 224 generates an approximate value of the input signal (provided on line 208). This approximation signal, output on line 226, is the current frame data and is input to motion estimator/compensator 209 to be used as a predictor for the next frame.
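The loop formed by difference circuit 215, the quantizer, the inverse quantizer and adder circuit 224 can be sketched for a single row of pixels as below. To keep the loop structure visible, the sketch assumes the transform is the identity (a real encoder applies the DCT before quantizing and the inverse DCT after rescaling), and `encode_closed_loop` is a hypothetical name.

```python
def encode_closed_loop(current, prediction, qp):
    """One pass of the encoder's local decoding loop.

    Returns the quantized levels to transmit and the reconstruction that
    the encoder keeps as its reference for predicting the next frame.
    """
    # Difference circuit: residual = input pixels - motion predicted pixels
    residual = [c - p for c, p in zip(current, prediction)]
    # Quantizer: divide by qp and truncate toward zero
    levels = [int(r / qp) for r in residual]
    # Inverse quantizer: multiply by qp (a real coder would also inverse-DCT)
    decoded_residual = [lvl * qp for lvl in levels]
    # Adder: prediction + decoded residual approximates the input. Using this
    # approximation, which the decoder can also form, rather than the original
    # input as the reference keeps encoder and decoder predictions in step.
    reconstruction = [p + d for p, d in zip(prediction, decoded_residual)]
    return levels, reconstruction
```

For `current = [120, 130, 140]`, `prediction = [100, 100, 100]` and `qp = 8`, the levels are `[2, 3, 5]` and the reconstruction is `[116, 124, 140]`, each value within one quantizer step of the input.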

[0033] Motion estimator/compensator 209 performs motion estimation/compensation to output the motion predicted signal on line 225 and motion vectors (MV) based on the pixel data input on line 208 and the approximation of the pixel data input on line 226. Motion vectors (MV) for one or more macroblocks are output via line 211.
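The text does not fix a particular motion estimation algorithm. As one common choice, a full-search block matcher that minimizes the sum of absolute differences (SAD) can be sketched as below; the search range, the SAD criterion and the function names are assumptions for illustration, and frames are plain lists of luminance rows. The block itself must lie inside the frame so that the zero vector is always a valid candidate.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def full_search(cur, ref, bx, by, size=16, search_range=7):
    """Return the motion vector (dx, dy) with minimum SAD for the block at (bx, by)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # skip candidate blocks that would fall outside the reference frame
            if not (0 <= by + dy and by + dy + size <= height
                    and 0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]
```

Exhaustive search is simple and finds the true SAD minimum within the window, at the cost of (2 × search_range + 1)² block comparisons per macroblock; faster coders often use hierarchical or pattern-based searches instead.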

[0034] A variable length coder 218 variable length codes the scaled DCT coefficients (input on line 216), motion vectors (MVs input on line 211), the chroma-key color and thresholds (input on line 203) and the bound_quant value (input on line 217) into a bitstream. The bitstream is output via line 220 to channel 219 for transmission.

[0035] Decoding system 230 (FIG. 2B) receives the encoded bitstream from channel 219 via line 228. A variable length decoder 232 variable length decodes the encoded bitstream into scaled DCT coefficients (output on line 234) for each macroblock, motion vectors for each macroblock (MVs output on line 222), the first_shape_codes for each macroblock (output on line 243), the bound_quant value (output on line 241) and the chroma-key color and threshold values (output on line 245).

[0036] The scaled DCT coefficients are input via line 234 and the bound_quant value is input via line 241 to inverse quantizer 236. Inverse quantizer 236 rescales the DCT coefficients according to the quantizer, which is a constant or variable scalar value (Q_p or VOP_quant). For example, the coefficients can be rescaled by multiplying each coefficient by the quantizer Q_p (both the standard quantizer Q_p and the bound_quant value can be transmitted with the bitstream). Inverse quantizer 236 increases the number of quantization levels available for encoding the signal (e.g., back to the original number of quantization levels). Inverse quantizer 236 may use one quantizer for macroblocks inside the boundary (e.g., VOP_quant) and a finer quantizer (bound_quant) for boundary macroblocks. The same quantizers used at the encoding system 202 are also used at the decoding system 230.

[0037] Inverse DCT encoder 240 performs an inverse DCT transform on the DCT coefficients received as an input via line 237 to output pixel values (luminance and chrominance values) for each macroblock on line 246.

[0038] For those macroblocks that were coded using motion compensation, the motion predicted signal provided on line 244 (output from motion compensator 242) is added by adder circuit 248 to the pixel values on line 246 to output the reconstructed pixel values for each macroblock on line 251. Motion compensator 242 generates the motion predicted pixel signal on line 244 based on the reconstructed pixel signal on line 251 and the motion vectors for each macroblock received via line 222.

[0039] The reconstructed pixel signal is input via line 251 to a color extractor and shape mask generator 249. The color extractor and shape mask generator 249 also receives as inputs the chroma-key color and thresholds (via line 245) and the first_shape_code (via line 243). The color extractor and shape mask generator 249 compares each pixel value (in the reconstructed pixel signal) to the chroma-key value (or a range of values near the chroma-key color). By comparing the pixel values to the chroma-key value, the color extractor and shape mask generator 249 can determine which pixels are located within the object and which pixels are located outside of the object, and thereby identify the original shape of the object in the VOP. The pixels located within the object are output via line 252 as a reconstructed video object (stripping off the chroma-key or background pixels to output object pixel values). Also, color extractor and shape mask generator 249 generates and outputs a shape mask identifying the shape of the video object. The shape mask can be generated as a binary map (e.g., a 1 or 0 for each pixel) or a gray scale map identifying whether each pixel is inside or outside the video object. The shape mask is output via line 254 and can be used, for example, by a compositor to combine multiple video objects into a single (multi-object) frame.
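The text leaves open exactly how each reconstructed pixel is compared against the chroma-key color and thresholds. The sketch below assumes a sum-of-absolute-differences distance in (Y, U, V) space, a single threshold t1 for the binary map, and a second threshold t2 for the gray-scale map (fully transparent at or below t1, fully opaque at or above t2, linear in between). All names are hypothetical, and pixels are (Y, U, V) tuples at a common resolution (chroma subsampling is ignored).

```python
def key_distance(pixel, key):
    """Sum of absolute differences between a (Y, U, V) pixel and the key color."""
    return sum(abs(a - b) for a, b in zip(pixel, key))


def shape_mask(pixels, key, t1, t2=None):
    """Binary mask (0/1) if t2 is None, otherwise a 0..255 gray-scale mask."""
    mask = []
    for row in pixels:
        out_row = []
        for px in row:
            d = key_distance(px, key)
            if t2 is None:
                out_row.append(0 if d <= t1 else 1)   # near the key: background
            elif d <= t1:
                out_row.append(0)
            elif d >= t2:
                out_row.append(255)
            else:
                # soft edge: opacity grows linearly between the two thresholds
                out_row.append(round(255 * (d - t1) / (t2 - t1)))
        mask.append(out_row)
    return mask


def extract_object(pixels, mask):
    """Keep object pixels; background pixels become None (transparent)."""
    return [[px if m else None for px, m in zip(prow, mrow)]
            for prow, mrow in zip(pixels, mask)]
```

With key (135, 160, 110), t1 = 10 and t2 = 60, a pixel at distance 30 from the key receives gray level round(255 × 20 / 50) = 102, which softens the object edge instead of cutting it hard.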

[0040] The above-described chroma-key shape coding technique of the present invention provides a simple and efficient method for video shape coding. Furthermore, the present invention includes several additional features and advantages that further improve or refine the above-described chroma-key shape coding technique without adding unjustifiable overhead or complexity. The present invention can include one or more of the following features:

[0041] 1. Bounding Box: Process Only Macroblocks Inside the Bounding Box:

[0042] Instead of enclosing each video object in a full size picture and processing all macroblocks in the picture, the video object can advantageously be enclosed by a bounding box and only macroblocks in the bounding box are processed. Prior to calculating a bounding box around the object, the object is first segmented from the video frame. Any of several well-known segmentation techniques can be used to segment the video object from the remainder of the video frame. The position of the bounding box is chosen such that it contains a minimum number of 16 pixel×16 pixel macroblocks. The encoding/decoding process is performed on a macroblock basis. In this manner, processing time can be reduced.
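A minimal sketch of computing such a bounding box from a binary segmentation mask follows. It assumes the mask is a list of rows of 0/1 values, anchors the box at the top-left extent of the object, and rounds width and height up to the next multiple of 16; clipping to the frame border is not handled, and `bounding_box` is a hypothetical name. Anchoring at the tight box corner already gives the minimum number of macroblocks, because covering a w × h extent needs at least ceil(w/16) × ceil(h/16) macroblocks in any case.

```python
MB_SIZE = 16  # macroblock size in pixels


def bounding_box(mask, mb=MB_SIZE):
    """Return (x0, y0, width, height) of the object's bounding box, with
    width and height extended to the next integer multiple of mb.

    Returns None if the mask contains no object pixels.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not ys:
        return None
    xs = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    x0, y0 = xs[0], ys[0]
    width = xs[-1] - x0 + 1
    height = ys[-1] - y0 + 1
    # round up to a whole number of macroblocks (ceiling division)
    width = -(-width // mb) * mb
    height = -(-height // mb) * mb
    return x0, y0, width, height
```

An object spanning columns 5 to 24 and rows 3 to 10 has a 20 × 8 tight extent, which becomes a 32 × 16 box, i.e. two macroblocks.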

[0043] FIG. 3 illustrates an example of a bounding box 310 that bounds a video object 315. The bounding box 310 is divided into a plurality of macroblocks 320. (This is similar to the bounding box used in the explicit shape coding of the MPEG-4 Verification Model (VM).) As a result, macroblocks within the bounding box 310 are either 1) inside the object 315 (where the macroblock is completely inside the object); 2) outside the object 315 (where the macroblock is completely outside the object); or 3) on the object boundary (e.g., the macroblock has both pixel(s) inside the object and pixel(s) outside the object).
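The three-way classification can be sketched as counting object pixels in each macroblock: none means outside, all means inside, anything in between means boundary. The sketch assumes a 0/1 mask and a box given as (x0, y0, width, height) with dimensions that are multiples of the macroblock size, and treats pixels beyond the frame edge as background; the names are hypothetical.

```python
OUTSIDE, INSIDE, BOUNDARY = "outside", "inside", "boundary"


def classify_macroblocks(mask, box, mb=16):
    """Map (column, row) of each macroblock in the box to its mode."""
    x0, y0, width, height = box
    frame_h, frame_w = len(mask), len(mask[0])
    modes = {}
    for row in range(height // mb):
        for col in range(width // mb):
            inside = 0
            for y in range(y0 + row * mb, y0 + (row + 1) * mb):
                for x in range(x0 + col * mb, x0 + (col + 1) * mb):
                    # pixels past the frame edge count as background
                    if 0 <= y < frame_h and 0 <= x < frame_w and mask[y][x]:
                        inside += 1
            if inside == 0:
                modes[(col, row)] = OUTSIDE
            elif inside == mb * mb:
                modes[(col, row)] = INSIDE
            else:
                modes[(col, row)] = BOUNDARY
    return modes
```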

[0044] FIG. 4 illustrates an example of a video object according to an embodiment of the present invention. Video object 405 is bounded by a bounding box (not shown). The bounding box is divided into a plurality of macroblocks, some of which are illustrated in FIG. 4. For example, macroblocks MB1, MB2 and MB5 are outside the video object 405. Macroblocks MB11, MB12 and MB14-16 are located inside the video object 405. Also, macroblocks MB3, MB4, MB6, MB7, MB9 and MB13 are on the object boundary.

[0045] 2. First_shape_code:

[0046] For each macroblock in the bounding box, the present invention can use a first_shape_code to identify whether the macroblock is:

[0047] a) outside the object;

[0048] b) inside the object; or

[0049] c) on the object boundary. A first_shape_code is transmitted with the data for each macroblock. (For those macroblocks outside the boundary, only the first_shape_code will be transmitted.)

[0050] The first_shape_code can be implemented in several different ways. Two examples of a first_shape_code are described below:

TABLE 1
first_shape_code    Macroblock Shape
0                   all_0 (outside the object)
1                   others (inside or on boundary)

[0051] In Table 1, first_shape_code is a 1-bit code that indicates whether or not the macroblock is outside the video object. A first_shape_code of 0 indicates that the macroblock is outside the object. A first_shape_code of 1 indicates that the macroblock is either inside the object or on the boundary.

TABLE 2
first_shape_code    Macroblock Shape
0                   boundary
10                  all_0 (outside the object)
11                  all_255 (inside the object)

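[0052] In Table 2, the first_shape_code is a variable-length code of one or two bits that identifies whether the macroblock is located on the boundary, outside the object, or inside the object: a single bit (0) identifies a boundary macroblock, and two bits (10 or 11) identify a macroblock outside or inside the object, respectively.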
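Because the Table 2 codewords are prefix-free (no codeword is the beginning of another), a decoder can read them bit by bit without separators. The sketch below writes and parses a sequence of first_shape_codes as a bit string; in an actual bitstream the codes would be interleaved with the macroblock data, and the names are hypothetical.

```python
# Table 2 codewords
FIRST_SHAPE_CODE = {"boundary": "0", "outside": "10", "inside": "11"}
MODE_FOR_CODE = {code: mode for mode, code in FIRST_SHAPE_CODE.items()}


def encode_first_shape_codes(modes):
    """Concatenate the Table 2 codeword for each macroblock mode."""
    return "".join(FIRST_SHAPE_CODE[m] for m in modes)


def decode_first_shape_codes(bits):
    """Parse a bit string back into macroblock modes."""
    modes, buf = [], ""
    for b in bits:
        buf += b
        if buf in MODE_FOR_CODE:  # prefix-free: the first match is a full codeword
            modes.append(MODE_FOR_CODE[buf])
            buf = ""
    if buf:
        raise ValueError("truncated first_shape_code")
    return modes
```

Giving the one-bit codeword to boundary macroblocks spends the fewest bits on the mode that is always coded with texture data and is typically frequent along the object outline.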

[0053] 3. Within the Bounding Box, Apply Chroma-keying Only to Boundary Macroblocks: Background Macroblocks are Not Coded:

[0054] For macroblocks on the boundary (identified, for example, by the first_shape_code), pixels outside the object (e.g., background pixels) are replaced with the chroma-key color. This chroma-key replacement of background pixels is performed only for boundary macroblocks. Replacing the background pixels in the boundary macroblocks with the chroma-key implicitly codes shape information for the object. Also, macroblocks outside the object (and within the bounding box) are not coded.
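A minimal sketch of the replacement: only pixels that lie in boundary macroblocks and outside the object are overwritten with the key color. It assumes pixels are (Y, U, V) tuples at a common resolution (ignoring 4:2:0 chroma subsampling), a 0/1 object mask, and macroblock modes keyed by (column, row) relative to the bounding box origin; the names are hypothetical, and the default key is the less saturated example color given in paragraph [0065].

```python
KEY = (135, 160, 110)  # (Y, U, V) example key color from paragraph [0065]


def chroma_key_boundary(frame, mask, modes, origin=(0, 0), mb=16, key=KEY):
    """Return a copy of frame with background pixels of boundary
    macroblocks replaced by the key color."""
    x0, y0 = origin
    out = [list(row) for row in frame]
    for (col, row), mode in modes.items():
        if mode != "boundary":
            continue  # inside MBs keep their texture; outside MBs are not coded
        for y in range(y0 + row * mb, y0 + (row + 1) * mb):
            for x in range(x0 + col * mb, x0 + (col + 1) * mb):
                if 0 <= y < len(out) and 0 <= x < len(out[0]) and not mask[y][x]:
                    out[y][x] = key
    return out
```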

[0055] After chroma-key pixel replacement, only macroblocks inside the object or on the boundary are coded (e.g., DCT transformed, quantized, and variable length coded for transmission). By not coding macroblocks located outside the video object (background macroblocks), a significant number of overhead bits can be saved, thereby increasing data compression.

[0056] In addition, information should be sent identifying those macroblocks inside the bounding box and outside the object (and thus identifying those macroblocks that were not coded). An additional bit can be added to the first_shape_code to identify those macroblocks that are within the bounding box but outside the object (identifying those macroblocks that are not coded).

[0057] 4. Bound_quant: Use a Finer Quantization for Boundary Macroblocks:

[0058] To further improve image quality, a finer quantization can be used for the boundary macroblocks, as compared to the quantization for the other macroblocks in the bounding box. This can be done by quantizer 214 (FIG. 2A) scaling or quantizing the DCT coefficients output on line 212 according to a smaller quantizer for the boundary macroblocks. Therefore, quantizer 214 uses a larger number of quantization levels (e.g., a smaller quantizer) for the boundary macroblocks, resulting in finer quantization of the boundary macroblocks. Because bandwidth is limited, using a larger number of quantization levels for the boundary macroblocks (e.g., a quantizer equal to VOP_quant multiplied by a factor less than 1) allocates a larger share of the available bits (bandwidth) to boundary macroblocks to better define the outer edge or boundary of the video object.

[0059] According to an embodiment of the present invention, a finer quantization for boundary macroblocks can be specified through a boundary quantization code (bound_quant). In the MPEG-4 VM, a VOP quantization code (VOP_quant) is a five-bit code that specifies the quantization for the VOP. In MPEG-4, DCT coefficients are divided by the VOP quantization code. According to the present invention, the background macroblocks within the bounding box are not coded. Therefore, according to an embodiment of the present invention, VOP_quant specifies the number of quantization levels for macroblocks inside the object and bound_quant specifies the number of quantization levels for boundary macroblocks.

[0060] According to an embodiment of the present invention, the bound_quant code can be used to specify the level of quantization for boundary macroblocks relative to the level of quantization for the other macroblocks, as follows:

TABLE 3
bound_quant    Boundary quantizer (times VOP_quant)
00             ½
01             ⅝
10             ⅞
11             1

[0061] In Table 3, a bound_quant code indicates the quantization parameter for boundary macroblocks as compared to the quantization parameter of other macroblocks. For example, a bound_quant of 11 indicates that the quantization parameter for boundary macroblocks is the same as (one times) the quantization parameter for other macroblocks in the bounding box (the VOP_quant). In this case, boundary macroblocks are quantized with the same quantization parameter as the other macroblocks.

[0062] Similarly, a bound_quant code of 00 indicates that the quantization parameter for the boundary macroblocks is one half of that for the other macroblocks, resulting in finer quantization of the boundary macroblocks. The other values of the bound_quant code specify intermediate quantization parameters (⅝ and ⅞ of VOP_quant) for boundary macroblocks. Other techniques can be used to specify an increased number of quantization levels (finer quantization) for boundary macroblocks (as compared to other macroblocks).
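The Table 3 mapping, and the resulting choice of quantizer per macroblock mode, can be sketched as below. Table 3 gives only the scale factor, so rounding the scaled quantizer to the nearest integer (ties to even, as Python's `round` does) with a floor of 1 is an assumption; the function names are hypothetical.

```python
from fractions import Fraction

# Table 3: bound_quant code -> factor applied to VOP_quant
BOUND_QUANT_FACTOR = {
    0b00: Fraction(1, 2),
    0b01: Fraction(5, 8),
    0b10: Fraction(7, 8),
    0b11: Fraction(1, 1),
}


def boundary_quantizer(bound_quant, vop_quant):
    """Quantizer for boundary macroblocks (rounding rule is an assumption)."""
    q = BOUND_QUANT_FACTOR[bound_quant] * vop_quant
    return max(1, round(q))


def quantizer_for(mode, vop_quant, bound_quant):
    """Quantizer used for a macroblock with the given mode."""
    if mode == "outside":
        return None  # background macroblocks are not coded
    if mode == "boundary":
        return boundary_quantizer(bound_quant, vop_quant)
    return vop_quant
```

With VOP_quant = 16, the four codes give boundary quantizers of 8, 10, 14 and 16; exact rational arithmetic via `Fraction` avoids floating-point surprises before the final rounding.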

[0063] 5. Choice of Chroma-key Color:

[0064] Although the choice of key color is an encoding issue, it has the potential of causing shape degradation due to color leakage if saturated colors are used. (Saturation is the degree of purity of a color; for example, a pure spectral color having a single wavelength has a saturation of 100%, while white light has a saturation of zero.) On the other hand, use of a saturated color improves shape recovery because natural scenes do not often contain such colors. However, the only restriction for chroma-keying is that the chroma-key color does not exist in the scene. The use of less saturated colors, similar to the ones used in studio environments for chroma-keying of scenes, has been investigated.

[0065] A relatively saturated color can be used, such as Y=50, Cb=200, Cr=100. However, weaker (less saturated) colors can be used to reduce the potential for shape distortion due to color bleeding. According to an embodiment of the present invention, an example of a less saturated color that can be used to decrease the potential for color bleeding is Y=135, Cb=160, Cr=110. Other less saturated colors can similarly be used as the chroma-key color to decrease the potential for shape distortion. For notational simplicity, instead of Cb and Cr, the notations U and V, respectively, will be used in the remainder of this application (although, strictly speaking, Cb and Cr differ from U and V by a small scaling factor and an offset).
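Using the distance of (U, V) from the neutral 8-bit chroma value 128 as a rough proxy for saturation (an assumption; the text gives no formula), the two example keys can be compared as below, together with a check of the one hard requirement, that the key color not occur in the scene. The function names and the per-component tolerance are hypothetical.

```python
import math

NEUTRAL = 128  # zero-chroma level for 8-bit Cb/Cr (U/V)


def chroma_magnitude(color):
    """Distance of a (Y, U, V) color from the neutral gray axis; larger
    values correspond to more saturated colors."""
    _, u, v = color
    return math.hypot(u - NEUTRAL, v - NEUTRAL)


def key_is_absent(frame, key, tolerance=0):
    """True if no pixel of frame lies within tolerance (per component) of key."""
    return not any(
        all(abs(p - k) <= tolerance for p, k in zip(px, key))
        for row in frame for px in row
    )
```

Here `chroma_magnitude((50, 200, 100))` is about 77.3 and `chroma_magnitude((135, 160, 110))` is about 36.7, so the second key sits much closer to the gray axis, consistent with it being the less saturated choice.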

[0066] FIG. 5 is a flow chart illustrating the operation of an encoding system according to an embodiment of the present invention.

[0067] At step 510, a video frame is received and a video object is segmented from the remainder of the video frame. One of several well-known techniques can be used to segment the object.

[0068] At step 515, a bounding box is created around the video object (VOP). The position of the bounding box is chosen such that it contains a minimum number of 16 pixel×16 pixel macroblocks. Macroblocks of other sizes can also be used. Processing is performed on a macroblock basis.

[0069] At step 520, each background pixel (pixel outside the object) is replaced with the chroma-key color K. This can be performed for all pixels in the picture or frame, or only for boundary macroblocks.

[0070] At step 525, within the bounding box, each macroblock is formatted and identified as either: 1) outside the video object (a background macroblock); 2) inside the object; or 3) on the object boundary. A code for each macroblock, such as the first_shape_code, is used to identify the position of each macroblock (inside, outside or on the object boundary).

[0071] At step 530, motion compensation is performed on at least some of the boundary or inside macroblocks, including calculating motion vectors. Motion vectors are calculated only for those macroblocks coded with motion compensation.
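One common way to calculate a motion vector is integer-pel full-search block matching that minimizes the sum of absolute differences (SAD). The sketch below shows that generic technique only; it is an assumption for illustration and does not attempt to reproduce the exact motion estimation procedure of MPEG-4 VM 5.0. Frames are assumed to be lists of rows of luminance values.

```python
def sad(cur, ref, r0, c0, dy, dx, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (r0, c0) and the block of `ref` displaced by (dy, dx)."""
    return sum(
        abs(cur[r0 + i][c0 + j] - ref[r0 + i + dy][c0 + j + dx])
        for i in range(n) for j in range(n)
    )


def full_search(cur, ref, r0, c0, n=16, search=7):
    """Integer-pel full-search motion estimation for one block.

    Returns the (dy, dx) with minimum SAD, considering only displacements
    that keep the reference block inside `ref`.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= r0 + dy and r0 + dy + n <= h
                    and 0 <= c0 + dx and c0 + dx + n <= w):
                continue
            cost = sad(cur, ref, r0, c0, dy, dx, n)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]


# Reference frame with a non-repeating pattern; the current block is the
# reference content displaced by (2, -3).
ref = [[(r * 31 + c * 17) % 251 for c in range(24)] for r in range(24)]
cur = [[ref[(r + 2) % 24][(c - 3) % 24] for c in range(24)] for r in range(24)]
print(full_search(cur, ref, 4, 4, n=8, search=4))  # (2, -3)
```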

[0072] At step 535, the luminance and chrominance (pixel) values for boundary macroblocks and macroblocks located inside the object are coded. According to the present invention, macroblocks outside the object (e.g., background macroblocks) are not coded. Thus, if all pixels (including pixels outside the bounding box) were replaced with the chroma-key color at step 520, the replaced pixels located outside the bounding box are simply discarded (the first_shape_codes indicate which macroblocks have no data transmitted for them). Coding includes DCT transforming the luminance and chrominance values for the macroblocks to obtain DCT coefficients, and then quantizing (scaling) the DCT coefficients. The motion vectors and the scaled DCT coefficients are then variable length coded. The steps of DCT transforming, quantizing (generally), performing motion compensation, calculating motion vectors and variable length coding can be performed, for example, in a manner similar to that set forth in MPEG-4 VM 5.0. According to an embodiment of the present invention, boundary macroblocks can be quantized using finer quantization than macroblocks inside the object.
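The transform-and-scale step can be sketched with an orthonormal 8×8 DCT-II and a simplified uniform quantizer that divides each coefficient by the quantizer and rounds, with the smaller quantizer chosen for boundary macroblocks. The quantizer here is an assumption modeled on the "dividing by the quantizer" wording of the claims; MPEG-4 VM quantization is more elaborate (dead zones, weighting matrices) and is not reproduced.

```python
import math

N = 8  # DCT block size


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block (list of lists of numbers)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return [
        [
            c(u) * c(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N) for y in range(N)
            )
            for v in range(N)
        ]
        for u in range(N)
    ]


def quantize(coeffs, q):
    """Simplified uniform quantizer: divide by q and round to an integer level."""
    return [[round(a / q) for a in row] for row in coeffs]


def quantizer_for(is_boundary, vop_quant, bound_quant):
    """Boundary macroblocks use bound_quant (expected <= vop_quant)."""
    return bound_quant if is_boundary else vop_quant


block = [[10] * N for _ in range(N)]  # flat 8x8 block
coeffs = dct2(block)                  # DC = 80, AC ~ 0
q = quantizer_for(True, vop_quant=10, bound_quant=6)
levels = quantize(coeffs, q)
print(q, levels[0][0])  # 6 13
```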

[0073] At step 540, a coded bit stream is output from the encoding system to the channel. The bit stream includes the transformed and quantized (scaled) luminance and chrominance data for each coded macroblock; motion vectors; codes (such as the first_shape_code) identifying the position or mode (e.g., inside, outside or on the boundary) of each macroblock; a code (such as the VOP_quant code) indicating the level of quantization for macroblocks located inside the object; a code (such as the bound_quant code) indicating the relative level of quantization for boundary macroblocks (if different); and the chroma-key and threshold values. The bit stream can also include additional information. For boundary macroblocks, pixels located outside the object have been replaced with the chroma-key color so as to implicitly code the shape of the object within the texture information (luminance and chrominance data) for the object.

[0074] To reduce overhead and improve data compression, macroblocks located outside the object (e.g., background macroblocks) are not coded, and the chroma key is applied to background pixels only in boundary macroblocks. In addition, a finer quantization can be used for boundary macroblocks to improve image quality.

[0075] FIG. 6 is a flow chart illustrating the operation of a decoding system according to an embodiment of the present invention.

[0076] At step 610, the bit stream is received from the channel, and the variable length codes are decoded to obtain the scaled DCT coefficients, motion vectors (MVs), codes identifying the location or mode of macroblocks (e.g., first_shape_code), quantizers (e.g., VOP_quant, bound_quant), and the chroma-key color and thresholds. No image data is provided for the identified background macroblocks.

[0077] At step 615, the data (including DCT coefficients and motion vectors) for each macroblock is inverse quantized (rescaled) based on the bound_quant code (for boundary macroblocks) and the VOP_quant code (for macroblocks inside the object).
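The rescaling step is the simplified inverse of the encoder-side division: multiply each decoded level by the quantizer selected for the macroblock. This minimal sketch assumes that multiply-only rule (as worded in the claims) and hypothetical function names; it also shows why a finer boundary quantizer reduces reconstruction error.

```python
def dequantize(levels, q):
    """Rescale quantized levels by multiplying by the quantizer q
    (the simplified inverse of dividing by q and rounding)."""
    return [[lv * q for lv in row] for row in levels]


def rescale_macroblock(levels, is_boundary, vop_quant, bound_quant):
    """Select bound_quant for boundary macroblocks and VOP_quant otherwise."""
    q = bound_quant if is_boundary else vop_quant
    return dequantize(levels, q)


# A coefficient of 37 survives a finer quantizer with less error.
coarse = rescale_macroblock([[round(37 / 8)]], False, vop_quant=8, bound_quant=4)
fine = rescale_macroblock([[round(37 / 4)]], True, vop_quant=8, bound_quant=4)
print(coarse[0][0], fine[0][0])  # 40 36
```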

[0078] At step 620, the DCT coefficients are inverse DCT transformed, and motion compensation is performed based on the motion vectors (for those macroblocks coded with motion compensation) to generate motion compensated luminance and chrominance pixel values for macroblocks inside the object and on the object boundary. This can be performed, for example, as specified by the MPEG-4 VM.
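A sketch of the reconstruction at this step: inverse-transform the residual coefficients with the inverse of the orthonormal 8×8 DCT-II, add the motion-compensated prediction, and clip to the 8-bit range. The list-of-lists representation, the rounding, and the clipping to [0, 255] are assumptions of this sketch, not details given in the text.

```python
import math

N = 8  # DCT block size


def idct2(coeffs):
    """Inverse of the orthonormal 2-D DCT-II (i.e. a 2-D DCT-III)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return [
        [
            sum(
                c(u) * c(v) * coeffs[u][v]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for u in range(N) for v in range(N)
            )
            for y in range(N)
        ]
        for x in range(N)
    ]


def reconstruct(residual_coeffs, prediction):
    """Add the inverse-transformed residual to the motion-compensated
    prediction, then round and clip each value to 0..255."""
    residual = idct2(residual_coeffs)
    return [
        [min(255, max(0, round(p + r))) for p, r in zip(prow, rrow)]
        for prow, rrow in zip(prediction, residual)
    ]


residual = [[0.0] * N for _ in range(N)]
residual[0][0] = 80.0                       # DC only: +10 on every pixel
prediction = [[100] * N for _ in range(N)]  # motion-compensated prediction
print(reconstruct(residual, prediction)[0][:3])  # [110, 110, 110]
```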

[0079] At step 622, the chroma-key color and thresholds (described in greater detail in the example below) are decoded.

[0080] At step 625, the reconstructed video object and its shape are recovered. The reconstructed video object can be recovered by passing only pixel values that are not equal to the chroma-key color (or not within a small range of the chroma-key color). This passes only the object pixel data.

[0081] Object shape information can be recovered by generating a shape mask or segmentation map indicating which pixels are part of the object and which are not. According to an embodiment of the present invention, the segmentation map can be generated as a binary segmentation map by determining whether or not each pixel value is near the chroma-key value K. If a pixel is near the chroma-key value (e.g., within a threshold T of the chroma-key value), then the pixel is not included in the recovered video object or frame. If the pixel is not near the chroma-key value (e.g., the pixel value is not within the threshold of the chroma-key value), then the pixel is included in the recovered video object (considered foreground). The video object has the shape indicated by the binary segmentation map and a texture (luminance and chrominance values) given by those decoded pixel values that are not near the chroma-key value. If the first_shape_code indicates which macroblocks are on the object boundary, then color extraction (comparing each pixel to the chroma-key to determine whether it is inside or outside the boundary) need only be performed for boundary macroblocks to obtain a binary map identifying the shape of the object.
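A minimal sketch of binary shape recovery for one macroblock, using the first_shape_code to skip color extraction for inside and outside macroblocks. The string labels for the code, the use of the squared YUV distance as the "nearness" measure (so T is in squared units), and the function name are assumptions of this sketch.

```python
def macroblock_shape_mask(position, pixels, key, t, size=16):
    """Binary shape mask (1 = object, 0 = background) for one macroblock.

    position: 'inside', 'outside' or 'boundary' (hypothetical stand-ins
    for the decoded first_shape_code). Only boundary macroblocks are
    compared with the key color; `pixels` is ignored otherwise and may be
    None for outside macroblocks, which carry no pixel data.
    """
    if position == "outside":
        return [[0] * size for _ in range(size)]
    if position == "inside":
        return [[1] * size for _ in range(size)]

    def dist(p):
        # Squared YUV distance to the key color.
        return sum((k - x) ** 2 for k, x in zip(key, p))

    # Within threshold t of the key -> background (0); otherwise object (1).
    return [[0 if dist(p) < t else 1 for p in row] for row in pixels]


K = (135, 160, 110)
row = [(136, 159, 110), (60, 100, 150)]  # near-key pixel, object pixel
print(macroblock_shape_mask("boundary", [row], K, t=100))  # [[0, 1]]
```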

[0082] One problem with using a single threshold T to generate a binary segmentation map at the decoder is that the hard decision can produce a rough or jagged object boundary. Instead of a binary map as described above, the segmentation map can have gray-level values to create softer boundaries. In computer graphics and in blue-screen movies, natural-looking, alias-free boundaries can be generated by using two thresholds instead of one in the boundary regions.

[0083] According to another embodiment of the present invention, two thresholds T₁ and T₂ can be used at the decoding system instead of a single threshold T. The region between T₁ and T₂ is the boundary. A value of 0 indicates background and a value of 255 indicates foreground (the object), assuming 8 bits of coding per pixel (merely as an example). Note that T₁ affects the amount of background, while T₂ affects the amount of foreground. If T₂ is too high, part of the foreground will fall into the boundary region and be rendered as partially transparent. If T₁ is too low, part of the background will be included in the object, introducing artifacts. On the other hand, if T₁ and T₂ are too close to each other, the object boundary becomes harder (losing the advantages of boundary softening). These tradeoffs can be weighed to select the best thresholds for a particular application; for example, human interaction and subjective judgments can be used at the encoding system to select T₁ and T₂. Setting T₁ equal to T₂ creates the step function, or sharp boundary, provided by the binary segmentation map.

[0084] Using two thresholds T₁ and T₂, the shape information can be recovered from the reconstructed texture information as follows:

[0085] 1) To calculate an alpha value for a decoded pixel X, first compute its distance d from the key color K by either of two methods:

[0086] Method 1 (default method): $d = (K_Y - X_Y)^2 + (K_U - X_U)^2 + (K_V - X_V)^2$

[0087] Method 2 (alternate method): $d_1 = |K_Y - X_Y| + |K_U - X_U| + |K_V - X_V|$

[0088] If method 2 is employed, d₁ must be multiplied by a scaling factor so that it spans the same range as d computed by method 1, because the thresholds T₁ and T₂ are sent with respect to that range.

[0089] 2) The alpha value (α) for each pixel is a function of the distance d between the reconstructed YUV values of pixel X and the key color K:

[0090] if (d<T₁) then α=0;

[0091] else if (T₁<d<T₂) then α=(d−T₁)/(T₂−T₁)×255;

[0092] else if (d>T₂) then α=255.
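The two distance measures and the two-threshold alpha ramp above can be sketched as follows. Several details are assumptions of this sketch: the text leaves d = T₁ and d = T₂ unspecified, so they are mapped to 0 and 255 (the limits of the linear ramp); the ramp value is rounded to an integer; T₁ = T₂ is handled as a hard binary threshold without dividing by zero; and the method 2 scaling factor is supplied by the caller because the text does not fix it.

```python
def dist_sq(key, x):
    """Method 1 (default): d = sum of squared Y, U, V differences."""
    return sum((k - v) ** 2 for k, v in zip(key, x))


def dist_abs(key, x, scale):
    """Method 2 (alternate): d1 = sum of absolute differences, times `scale`.

    The scaling factor is not given in the text; the caller chooses it so
    that the result spans the same range as method 1.
    """
    return scale * sum(abs(k - v) for k, v in zip(key, x))


def alpha(d, t1, t2):
    """Map distance d to alpha in 0..255 using thresholds t1 <= t2.

    Assumptions: d == t1 gives 0 and d >= t2 gives 255, the ramp value is
    rounded, and t1 == t2 yields a hard binary decision.
    """
    if d < t1:
        return 0
    if d >= t2:
        return 255
    return round((d - t1) / (t2 - t1) * 255)


K = (135, 160, 110)
X = (137, 163, 106)         # hypothetical decoded pixel
d = dist_sq(K, X)           # 4 + 9 + 16 = 29
print(d, alpha(d, 20, 65))  # 29 51
```

One candidate scaling factor for method 2 is 255, the ratio of the two maximum distances for 8-bit components (3·255² for method 1 versus 3·255 for method 2); other choices are possible.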

[0093] The values T₁ and T₂ are set by the encoder (assuming method 1 for computing d) and sent to the decoder as side information. According to an embodiment of the present invention, α can denote the transparency of a pixel: α = 255 indicates that the pixel is opaque, and α = 0 indicates that the pixel is transparent. A pixel with α between 0 and 255 is semi-transparent, and its resulting value is a weighted combination of the pixel value in the current picture and the pixel value from a background picture that is specified externally or in advance. This allows smoothing or blending to be performed at object boundaries. Thus, the resulting pixel value for each component (Y, U and V) can be calculated as:

[α·X+(255−α)·Z]/255

[0094] where X is the decoded pixel component value (X_Y, X_U or X_V), and Z is the corresponding component value (Z_Y, Z_U or Z_V) of the background picture. This calculation should be performed for each component (Y, U, V).
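The blending formula can be sketched per component as below. Pixels are assumed to be (Y, U, V) tuples, and rounding the result to the nearest integer is an assumption, since the text does not say how fractional results are handled.

```python
def blend(x, z, a):
    """Composite a decoded pixel x over background pixel z.

    x and z are (Y, U, V) tuples and a is alpha in 0..255. Each component
    is [a*X + (255 - a)*Z] / 255, rounded to the nearest integer.
    """
    return tuple(round((a * xc + (255 - a) * zc) / 255) for xc, zc in zip(x, z))


obj_px = (100, 120, 130)  # hypothetical decoded object pixel
bg_px = (0, 0, 0)         # hypothetical background picture pixel
print(blend(obj_px, bg_px, 255))  # (100, 120, 130)
print(blend(obj_px, bg_px, 51))   # (20, 24, 26)
```

With α = 255 the object pixel passes through unchanged, and with α = 0 the background pixel is output, which matches the opaque and transparent cases described above.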

[0095] The system of the present invention can include an encoding system 202 and a decoding system 230. Encoding system 202 uses chroma-key shape coding to implicitly encode shape information. Encoding system 202 includes a bounding box generator and color replacer 206, a macroblock formatter and mode decider 207, a DCT encoder 210, a quantizer 214, a motion estimator/compensator 209 and a variable length coder 218. A video object to be encoded is enclosed by a bounding box, and only macroblocks in the bounding box are processed. The position of the bounding box is chosen such that it contains a minimum number of macroblocks. The encoding/decoding process is performed macroblock by macroblock. To increase data compression, macroblocks outside the bounding box are not coded.

[0096] A code can be used to identify each macroblock inside the bounding box as either 1) outside the object; 2) inside the object; or 3) on the object boundary. For boundary macroblocks, pixels located outside the object (e.g., background pixels) are replaced with a chroma-key color K to implicitly encode the shape of the object. The luminance and chrominance values for macroblocks inside the object and on the boundary are coded. Coding includes, for example, transforming the luminance and chrominance values to obtain DCT coefficients, and quantizing (scaling) the DCT coefficients. Motion compensation can also be performed on macroblocks to generate motion vectors. In addition, boundary macroblocks can be quantized at a finer level to improve image quality. A bitstream is output from encoding system 202. The bitstream includes the transformed and quantized (scaled) luminance and chrominance data for each coded macroblock, motion vectors, codes (such as the first_shape_code) identifying the position of each macroblock (e.g., inside, outside or on the boundary), a quantizer code (such as the VOP_quant code) indicating the number of quantization levels for macroblocks located inside the object, and a quantizer code (such as the bound_quant code) indicating the number of quantization levels for boundary macroblocks (if different).

[0097] Decoding system 230 includes a variable length decoder 232, an inverse quantizer 236, a motion compensator 242, an inverse DCT 240, and a color extractor and shape mask generator 249. A bitstream received by decoding system 230 is decoded to obtain both texture information (e.g., luminance and chrominance data) and shape information for a video object; the shape information is implicitly encoded. DCT coefficients and motion vectors for each macroblock are inverse quantized (rescaled) based on the bound_quant code (for boundary macroblocks) and the VOP_quant code (for macroblocks inside the object). Motion compensated luminance and chrominance values are generated based on the motion vectors. The color extractor and shape mask generator 249 reconstructs the video object by passing only pixel values that are different from the chroma-key color, and generates a shape mask (identifying the shape of the object), also by comparing pixel values to the chroma-key color. These two processes can be performed together. The shape of the object (and thus an identification of the object itself) can be determined by comparing each pixel value with the chroma-key value K. If a pixel is within a predetermined threshold of the chroma-key value, the pixel is not included in the recovered video object or frame (rather, it is considered background). If the pixel is not within the threshold of the chroma-key value, then the pixel is included in the recovered video object (considered foreground). The shape of the video object is thus recovered (e.g., by generating a binary shape mask at the decoding system based on the pixel value comparison). For example, the binary shape mask can be generated as 1s for object data and 0s for the other (background) pixels. The texture of the object is recovered as the decoded luminance and chrominance values of the object (e.g., pixel values outside the threshold of the chroma-key value are output as texture data of the object). Also, a gray-scale segmentation map can be generated using two thresholds to soften the object boundaries.

What is claimed is:
 1. A method of implicitly encoding shape information for a video object, comprising the steps of: receiving a video frame, including a video object; creating a box bounding the video object, the bounding box divided into a plurality of macroblocks, each macroblock comprising a plurality of chrominance and luminance pixels; identifying which macroblocks are inside the object or on the object boundary; for each boundary macroblock, replacing each pixel outside the object with a key color; for boundary macroblocks and macroblocks inside the object, computing luminance and chrominance pixel difference values by subtracting motion compensated prediction signals from the corresponding luminance and chrominance pixel values; for boundary macroblocks and macroblocks inside the object, transforming the luminance and chrominance pixel difference values to frequency domain coefficients; scaling the coefficients for macroblocks inside the object using a first quantizer; scaling the coefficients for boundary macroblocks using a second quantizer to provide a finer level of quantization for said boundary macroblocks as compared to said macroblocks inside the object; and outputting a bitstream including the scaled coefficients and information identifying the quantizers.
 2. A method of implicitly encoding shape information for a video object comprising the steps of: receiving a video frame, including a video object; creating the tightest box bounding the video object, extending the box in horizontal and vertical directions to fit the next integer number of macroblocks in each direction, the extended bounding box divided into a plurality of macroblocks, each macroblock comprising a 16×16 array of luminance pixels in the form of four 8×8 blocks and the corresponding chrominance pixels; identifying which macroblocks are inside the object or on the object boundary; for each boundary macroblock, replacing each pixel outside the object with a key color; for boundary macroblocks and macroblocks inside the object, computing luminance and chrominance pixel difference values by subtracting motion compensated prediction signals from the corresponding luminance and chrominance pixel values; for boundary macroblocks and macroblocks inside the object, transforming the luminance and chrominance pixel difference values to frequency domain coefficients; scaling the coefficients for macroblocks inside the object using a first quantizer; scaling the coefficients for boundary macroblocks using a second quantizer, wherein the second quantizer is smaller than or equal to the first quantizer to provide a finer level of quantization for said boundary macroblocks; and outputting a bitstream including the scaled coefficients and information identifying the quantizers.
 3. The method of claim 1 wherein the key color is chosen to be from among the less saturated colors and the key color does not exist in the object.
 4. The method of claim 1 wherein said bitstream further comprises a first_shape_code provided for at least some of the macroblocks and efficiently identifying which of the macroblocks are inside the object and identifying which macroblocks are outside the object.
 5. The method of claim 1 wherein said bitstream further comprises a first_shape_code provided for each macroblock and efficiently identifying which of the macroblocks are inside the object, outside the object or on the boundary of the object.
 6. The method of claim 1 and further comprising the step of variable length coding the scaled coefficients and said information.
 7. The method of claim 1 wherein said bitstream comprises coded motion vectors, transformed and scaled luminance and chrominance pixel difference values, and codes indicating the quantizers for boundary macroblocks and other macroblocks inside the bounding box, and an identification of the macroblocks outside the object.
 8. The method of claim 1 wherein said step of transforming comprises the step of discrete cosine transform (DCT) transforming the luminance and chrominance values to DCT coefficients for boundary macroblocks and macroblocks inside the object.
 9. The method of claim 1 wherein: said step of scaling the coefficients for macroblocks inside the object using a first quantizer comprises the step of dividing the coefficients for macroblocks inside the object by the first quantizer; and said step of scaling the coefficients for boundary macroblocks using a second quantizer comprises the step of dividing the coefficients for boundary macroblocks by the second quantizer, wherein the second quantizer is less than or equal to the first quantizer.
 10. A method of decoding a video bitstream in which the shape of a video object has been implicitly encoded, comprising the steps of: receiving a bitstream representing a video object, the bitstream including scaled frequency domain coefficients for each of a plurality of macroblocks inside the object or on the object boundary; rescaling the coefficients for macroblocks inside the object using a first quantizer; rescaling the coefficients for macroblocks on the object boundary using a second quantizer wherein the second quantizer is smaller than or equal to the first quantizer; inverse transforming the frequency domain coefficients to obtain luminance and chrominance pixel difference values; adding a prediction signal generated by a motion compensator to the luminance and chrominance pixel difference values to obtain the luminance and chrominance pixel values of a reconstructed video object; and recovering the approximate shape of the object by analyzing the luminance and chrominance values of at least the boundary macroblocks of the reconstructed video object.
 11. The method of claim 10 wherein each macroblock comprises a 16×16 array of luminance pixels in the form of four 8×8 blocks and the corresponding chrominance pixels.
 12. The method of claim 10 wherein said step of inverse transforming comprises the step of inverse discrete cosine transform (DCT) transforming the frequency domain coefficients to obtain the luminance and chrominance pixel difference values.
 13. The method of claim 10 wherein said step of recovering the approximate shape of the object comprises the following steps: decoding the chroma-key value and a threshold from the bitstream; comparing each pixel value of the boundary macroblocks of the reconstructed object to the chroma-key value; if the pixel value is within the threshold of the chroma-key value, then the pixel is not included in the recovered video object; if the pixel is not within the predetermined threshold of the chroma-key value, then the pixel is included in the recovered video object.
 14. The method of claim 10 wherein said step of recovering the approximate shape of the object comprises the following steps: decoding the chroma-key value and first and second thresholds T₁ and T₂ from the bitstream; calculating an alpha map based on the luminance and chrominance pixel values of the reconstructed object, the chroma-key color and the first and second thresholds; and applying the alpha map to the luminance and chrominance pixel values to obtain final luminance and chrominance values.
 15. The method of claim 14 wherein said step of calculating an alpha map comprises the following steps applied either to object boundary macroblocks or to both object boundary macroblocks and macroblocks inside the object: A) calculate an alpha value for a decoded pixel (X) by first computing the distortion measure $d = (K_Y - X_Y)^2 + (K_U - X_U)^2 + (K_V - X_V)^2$, wherein K_Y, K_U and K_V represent luminance and chrominance values for the chroma-key color K, and wherein X_Y, X_U and X_V represent luminance and chrominance values for a pixel.
 16. The method of claim 14 wherein said step of calculating an alpha map comprises the following steps applied either to object boundary macroblocks or to both object boundary macroblocks and macroblocks inside the object: A) calculate an alpha value for a decoded pixel (X) by first computing the distortion measure $d_1 = |K_Y - X_Y| + |K_U - X_U| + |K_V - X_V|$, wherein K_Y, K_U and K_V represent luminance and chrominance values for the chroma-key color K, and wherein X_Y, X_U and X_V represent luminance and chrominance values for a pixel; and multiply d₁ by a scaling factor.
 17. The method of claim 15 wherein said step of applying comprises the steps of: B) calculate the alpha value (α) for each pixel in said macroblocks as a function of the distance d between the reconstructed pixel luminance and chrominance values (YUV) and the chroma-key color K (using K_Y, K_U, K_V and thresholds T₁ and T₂): if (d<T₁) then α=0; else if (T₁<d<T₂) then α=(d−T₁)/(T₂−T₁)×255; else if (d>T₂) then α=255; and assign α=0 to pixels of macroblocks outside the object and α=255 to pixels of macroblocks inside the object if not already assigned a value by the above equations; and C) calculate the final pixel luminance and chrominance values for the reconstructed object as follows: pixel value=[α·X+(255−α)·Z]/255, wherein Z is the corresponding background pixel.
 18. The method of claim 10 wherein: said step of rescaling the transformed coefficients for macroblocks inside the object using a first quantizer comprises the step of multiplying the transformed coefficients by the first quantizer; and said step of rescaling the transformed coefficients for macroblocks on the object boundary using a second quantizer comprises the step of multiplying the transformed coefficients for boundary macroblocks by the second quantizer.